AIbase

# Multimodal speech recognition

Gemma 3 4b It Speech
Gemma-3-MM is a multimodal instruction-tuned model that extends Gemma-3-4b-it with speech processing. It accepts text, image, and audio inputs and generates text output.
Audio-to-Text, Transformers
junnei
383
12
© 2025 AIbase